
1.

Network-based visualisation of frequent sequences.

Bántay, László; Abonyi, János.

PLoS One ; 19(5): e0301262, 2024.

Article En | MEDLINE | ID: mdl-38722864

Frequent sequence pattern mining is an excellent tool to discover patterns in event chains. In complex systems, events from parallel processes are present, often without proper labelling. To identify the groups of events related to each subprocess, frequent sequential pattern mining can be applied. Since most algorithms return so many frequent sequences that the results become difficult to interpret, it is necessary to post-process the resulting frequent patterns. The available visualisation techniques do not allow easy access to multiple properties that support a faster and better understanding of the event scenarios. To address this issue, our work proposes an intuitive and interactive solution to support this task, introducing three novel network-based sequence visualisation methods that can reduce the time of information processing from a cognitive perspective. The proposed visualisation methods offer a more information-rich and easily understandable interpretation of sequential pattern mining results compared to the usual text-like outcome of pattern mining algorithms. The first uses the confidence values of the transitions to create a weighted network, while the second enriches the adjacency matrix based on the confidence values with similarities of the transitive nodes. The enriched matrix enables a similarity-based Multidimensional Scaling (MDS) projection of the sequences. The third method uses similarity measurement based on the overlap of the occurrences of the supporting events of the sequences. The applicability of the methods is presented in an industrial alarm management problem and in the analysis of clickstreams of a website. The methods were fully implemented in a Python environment. The results show that the proposed methods are highly applicable for the interactive processing of frequent sequences, supporting the exploration of the inner mechanisms of complex systems.
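
The first method in the abstract above builds a weighted network from the confidence values of transitions. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the usual association-rule reading of confidence (how often event a is directly followed by event b, divided by how often a has any successor), since the abstract does not give the exact formula. The function name `transition_confidence` and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def transition_confidence(sequences):
    """Return {a: {b: confidence}} for directly-following event pairs.

    Assumption: confidence(a -> b) = count(a immediately followed by b)
    divided by count(a followed by anything).
    """
    antecedent_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        # Walk over consecutive pairs (a, b) in each sequence.
        for a, b in zip(seq, seq[1:]):
            antecedent_counts[a] += 1
            pair_counts[(a, b)] += 1

    network = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        network[a][b] = n / antecedent_counts[a]
    return dict(network)


# Toy event sequences (illustrative only, not data from the paper).
sequences = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
network = transition_confidence(sequences)
print(network["B"])  # {'C': 0.5, 'D': 0.5}
```

With this definition the outgoing edge weights of each node sum to 1, so the nested dictionary can be read directly as a weighted adjacency list for a graph-drawing library.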

Algorithms , Data Mining/methods , Humans

2.

A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering.

Zheng, Xinghai; Lim, Peng Ken; Mutwil, Marek; Wang, Yuefei.

BMC Plant Biol ; 24(1): 373, 2024 May 08.

Article En | MEDLINE | ID: mdl-38714965

BACKGROUND: As one of the world's most important beverage crops, tea plants (Camellia sinensis) are renowned for their unique flavors and numerous beneficial secondary metabolites, attracting researchers to investigate the formation of tea quality. With the increasing availability of transcriptome data on tea plants in public databases, conducting large-scale co-expression analyses has become feasible to meet the demand for functional characterization of tea plant genes. However, as the multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing a subset of samples generated by effectively downsampling and reorganizing the global sample set often leads to more accurate results in co-expression analysis. Meanwhile, global-based co-expression analyses are more likely to overlook condition-specific gene interactions, which may be more important and worthy of exploration and research. RESULTS: Here, we employed the k-means clustering method to organize and classify the global samples of tea plants, resulting in clustered samples. Metadata annotations were then performed on these clustered samples to determine the "conditions" represented by each cluster. Subsequently, we conducted gene co-expression network analysis (WGCNA) separately on the global samples and the clustered samples, resulting in global modules and cluster-specific modules. Comparative analyses of global modules and cluster-specific modules have demonstrated that cluster-specific modules exhibit higher accuracy in co-expression analysis. To measure the degree of condition specificity of genes within condition-specific clusters, we introduced the correlation difference value (CDV). By incorporating the CDV into co-expression analyses, we can assess the condition specificity of genes. 
This approach proved instrumental in identifying a series of high CDV transcription factor encoding genes upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and pinpointing a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress. CONCLUSIONS: To summarize, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis. Cluster-specific modules were more accurate in capturing condition-specific gene interactions. The introduction of CDV allowed for the assessment of condition specificity in gene co-expression analyses. Using this approach, we identified a series of high CDV transcription factor encoding genes related to sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress in Camellia sinensis.

Camellia sinensis , Camellia sinensis/genetics , Camellia sinensis/metabolism , Cluster Analysis , Genes, Plant , Gene Expression Profiling/methods , Data Mining/methods , Transcriptome , Gene Expression Regulation, Plant , Gene Regulatory Networks

3.

Stacking with Recursive Feature Elimination-Isolation Forest for classification of diabetes mellitus.

Idris, Nur Farahaina; Ismail, Mohd Arfian; Jaya, Mohd Izham Mohd; Ibrahim, Ashraf Osman; Abulfaraj, Anas W; Binzagr, Faisal.

PLoS One ; 19(5): e0302595, 2024.

Article En | MEDLINE | ID: mdl-38718024

Diabetes Mellitus is one of the oldest diseases known to humankind, dating back to ancient Egypt. The disease is a chronic metabolic disorder that heavily burdens healthcare providers worldwide due to the steady yearly increase in patients. Worryingly, diabetes affects not only the aging population but also children. It is essential to control this problem, as diabetes can lead to many health complications. As technology evolves, humankind has started integrating computer technology with the healthcare system. The utilization of artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, deliver better care, and be more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is one of the most prominent methods applied in the diabetes domain. Hence, this study opts to investigate the potential of stacking ensembles. The aim of this study is to reduce the high complexity inherent in stacking, which contributes to longer training time, and to reduce the outliers in the diabetes data to improve the classification performance. In addressing this concern, a novel machine learning method called the Stacking Recursive Feature Elimination-Isolation Forest was introduced for diabetes prediction. The application of stacking with Recursive Feature Elimination is to design an efficient model for diabetes diagnosis while using fewer features as resources. This method also incorporates the utilization of Isolation Forest as an outlier removal method. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation metrics to identify the classification performances. The proposed method acquired an accuracy of 79.077% for PIMA Indians Diabetes and 97.446% for the Diabetes Prediction dataset, outperforming many existing methods and demonstrating effectiveness in the diabetes domain.

Diabetes Mellitus , Machine Learning , Humans , Diabetes Mellitus/diagnosis , Algorithms , Data Mining/methods , Support Vector Machine , Male

4.

Demystifying COVID-19 mortality causes with interpretable data mining.

Qian, Xinyu; Zuo, Zhihong; Xu, Danni; He, Shanyun; Zhou, Conghao; Wang, Zhanwen; Xie, Shucai; Zhang, Yongmin; Wu, Fan; Lyu, Feng; Zhang, Lina; Qian, Zhaoxin.

Sci Rep ; 14(1): 10076, 2024 05 02.

Article En | MEDLINE | ID: mdl-38698064

As COVID-19 becomes a periodically recurring disease, older individuals remain vulnerable to severe disease with high mortality. Although some studies have revealed different risk factors affecting the death of COVID-19 patients, researchers rarely provide a comprehensive analysis of the relationships and interactive effects of the risk factors of COVID-19 mortality, especially in the elderly. By retrospectively including 1917 COVID-19 patients (102 of whom died) admitted to Xiangya Hospital from December 2022 to March 2023, we used the association rule mining method to identify the risk factors leading to death among the elderly. First, we used Affinity Propagation clustering to extract key features from the dataset. Then, we applied the Apriori algorithm to obtain 6 groups of abnormal feature combinations with significant increments in mortality rate. The results showed a relationship between the number of abnormal feature combinations and mortality rates within different groups. Patients with "C-reactive protein > 8 mg/L", "neutrophils percentage > 75.0%", "lymphocytes percentage < 20%", and "albumin < 40 g/L" have a mortality rate 2× the baseline. When the characteristics "D-dimer > 0.5 mg/L" and "WBC > 9.5 × 10^9/L" are successively added to this foundation, the mortality rate can increase to 3× or 4×. In addition, we also found that liver and kidney diseases significantly affect patient mortality, and the mortality rate can be as high as 100%. These findings can support auxiliary diagnosis and treatment to facilitate early intervention in patients, thereby reducing patient mortality.
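
The "2×", "3×" and "4×" figures in the abstract above compare the mortality rate among patients showing a given combination of abnormal features with the baseline rate. The Python sketch below illustrates only that ratio (the lift of the rule "combination implies death") for a single candidate combination. It does not implement Apriori candidate generation, and it is not the authors' code. The function name `mortality_lift`, the feature labels, and the eight toy records are invented for illustration.

```python
def mortality_lift(patients, features):
    """Mortality rate among patients showing all `features`, divided by the overall rate.

    Each patient is a dict with keys "abnormal" (a set of feature labels)
    and "died" (bool). Returns None when the ratio is undefined.
    """
    overall = sum(p["died"] for p in patients) / len(patients)
    matching = [p for p in patients if features <= p["abnormal"]]
    if not matching or overall == 0:
        return None
    rate = sum(p["died"] for p in matching) / len(matching)
    return rate / overall


# Toy cohort (made up): 8 patients, 2 deaths, so the baseline rate is 0.25.
patients = [
    {"abnormal": {"CRP_high", "ALB_low"}, "died": True},
    {"abnormal": {"CRP_high", "ALB_low"}, "died": True},
    {"abnormal": {"CRP_high", "ALB_low"}, "died": False},
    {"abnormal": {"CRP_high", "ALB_low", "WBC_high"}, "died": False},
    {"abnormal": {"CRP_high"}, "died": False},
    {"abnormal": {"ALB_low"}, "died": False},
    {"abnormal": set(), "died": False},
    {"abnormal": set(), "died": False},
]

lift = mortality_lift(patients, {"CRP_high", "ALB_low"})
print(lift)  # 2.0
```

In a full association-rule workflow, a frequent-itemset miner would first enumerate the combinations worth testing, and a ratio of this kind would then rank them.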

COVID-19 , Data Mining , Humans , COVID-19/mortality , Aged , Male , Female , Retrospective Studies , Middle Aged , Risk Factors , SARS-CoV-2/isolation & purification , Aged, 80 and over , Algorithms

5.

Mining Real-World Big Data to Characterize Adverse Drug Reaction Quantitatively: Mixed Methods Study.

Yue, Qi-Xuan; Ding, Ruo-Fan; Chen, Wei-Hao; Wu, Lv-Ying; Liu, Ke; Ji, Zhi-Liang.

J Med Internet Res ; 26: e48572, 2024 May 03.

Article En | MEDLINE | ID: mdl-38700923

BACKGROUND: Adverse drug reactions (ADRs), which are the phenotypic manifestations of clinical drug toxicity in humans, are a major concern in precision clinical medicine. A comprehensive evaluation of ADRs is helpful for unbiased supervision of marketed drugs and for discovering new drugs with high success rates. OBJECTIVE: In current practice, drug safety evaluation is often oversimplified to the occurrence or nonoccurrence of ADRs. Given the limitations of current qualitative methods, there is an urgent need for a quantitative evaluation model to improve pharmacovigilance and the accurate assessment of drug safety. METHODS: In this study, we developed a mathematical model, namely the Adverse Drug Reaction Classification System (ADReCS) severity-grading model, for the quantitative characterization of ADR severity, a crucial feature for evaluating the impact of ADRs on human health. The model was constructed by mining millions of real-world historical adverse drug event reports. A new parameter called Severity_score was introduced to measure the severity of ADRs, and upper and lower score boundaries were determined for 5 severity grades. RESULTS: The ADReCS severity-grading model exhibited excellent consistency (99.22%) with the expert-grading system, the Common Terminology Criteria for Adverse Events. Hence, we graded the severity of 6277 standard ADRs for 129,407 drug-ADR pairs. Moreover, we calculated the occurrence rates of 6272 distinct ADRs for 127,763 drug-ADR pairs in large patient populations by mining real-world medication prescriptions. With the quantitative features, we demonstrated example applications in systematically elucidating ADR mechanisms and thereby discovered a list of drugs with improper dosages. CONCLUSIONS: In summary, this study represents the first comprehensive determination of both ADR severity grades and ADR frequencies. 
This endeavor establishes a strong foundation for future artificial intelligence applications in discovering new drugs with high efficacy and low toxicity. It also heralds a paradigm shift in clinical toxicity research, moving from qualitative description to quantitative evaluation.

Big Data , Data Mining , Drug-Related Side Effects and Adverse Reactions , Humans , Data Mining/methods , Pharmacovigilance , Models, Theoretical , Adverse Drug Reaction Reporting Systems/statistics & numerical data

6.

Potential mechanisms of traditional Chinese medicine in treating insomnia: A network pharmacology, GEO validation, and molecular-docking study.

Liu, Xing; Sun, Pengcheng; Bao, Xuejie; Cao, Yanqi; Wang, Liying; Wang, Qi.

Medicine (Baltimore) ; 103(18): e38052, 2024 May 03.

Article En | MEDLINE | ID: mdl-38701256

The purpose of this study is to investigate the potential mechanisms of Chinese herbs for the treatment of insomnia using a combination of data mining, network pharmacology, and molecular-docking validation. All the prescriptions for insomnia treated by the academician Qi Wang from 2020 to 2022 were collected. The Ancient and Modern Medical Case Cloud Platform v2.3 was used to identify high-frequency Chinese medicinal herbs and the core prescription. The Traditional Chinese Medicine Systems Pharmacology and UniProt databases were utilized to predict the effective active components and targets of the core herbs. Insomnia-related targets were collected from 4 databases. The intersecting targets were utilized to build a protein-protein interaction network and conduct gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis using the STRING database, Cytoscape software, and clusterProfiler package. Gene chip data (GSE208668) were obtained from the Gene Expression Omnibus database. The limma package was applied to identify differentially expressed genes (DEGs) between insomnia patients and healthy controls. To create a "transcription factor (TF)-miRNA-mRNA" network, the differentially expressed miRNAs were entered into the TransmiR, FunRich, Targetscan, and miRDB databases. Subsequently, the overlapping targets were validated using the DEGs, and further validations were conducted through molecular docking and molecular dynamics simulations. Among the 117 prescriptions, 65 herbs and a core prescription were identified. Network pharmacology and bioinformatics analysis revealed that active components such as ß-sitosterol, stigmasterol, and canadine acted on hub targets, including interleukin-6, caspase-3, and hypoxia-inducible factor-1α. In GSE208668, 6417 DEGs and 7 differentially expressed miRNAs were identified. 
A "TF-miRNA-mRNA" network was constructed by 4 "TF-miRNA" interaction pairs and 66 "miRNA-mRNA" interaction pairs. Downstream mRNAs exert therapeutic effects on insomnia by regulating circadian rhythm. Molecular-docking analyses demonstrated good docking between core components and hub targets. Molecular dynamics simulation displayed the strong stability of the complex formed by small molecule and target. The core prescription by the academician Qi Wang for treating insomnia, which involves multiple components, targets, and pathways, showed the potential to improve sleep, providing a basis for clinical treatment of insomnia.

Drugs, Chinese Herbal , Medicine, Chinese Traditional , MicroRNAs , Molecular Docking Simulation , Network Pharmacology , Protein Interaction Maps , Sleep Initiation and Maintenance Disorders , Sleep Initiation and Maintenance Disorders/drug therapy , Sleep Initiation and Maintenance Disorders/genetics , Humans , Drugs, Chinese Herbal/therapeutic use , Drugs, Chinese Herbal/pharmacology , Medicine, Chinese Traditional/methods , Gene Regulatory Networks/drug effects , RNA, Messenger/metabolism , RNA, Messenger/genetics , Data Mining , Transcription Factors/genetics

7.

Identification of pattern mining algorithm for rugby league players positional groups separation based on movement patterns.

Adeyemo, Victor Elijah; Palczewska, Anna; Jones, Ben; Weaving, Dan.

PLoS One ; 19(5): e0301608, 2024.

Article En | MEDLINE | ID: mdl-38691555

The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can only occur as consecutive, non-consecutive, or non-sequential, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and quantify the similarity among distinct movement patterns. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns to profile the match movements of elite rugby football league hookers (n = 22 players) and wingers (n = 28 players) across 319 matches. The Jaccard similarity score was used to quantify the similarity between the algorithms' movement patterns, and machine learning classification modelling identified which algorithm's movement patterns best separated playing positions. LCCspm and LCS movement patterns shared a 0.19 Jaccard similarity score. AprioriClose movement patterns shared no significant Jaccard similarity with LCCspm (0.008) and LCS (0.009) patterns. The closed contiguous movement patterns profiled by LCCspm best separated players into playing positions. The Multi-layered Perceptron classification algorithm achieved the highest accuracy (91.02%), with precision, recall and F1 scores of 0.91 each. Therefore, we recommend the extraction of closed contiguous (consecutive) over non-consecutive and non-sequential movement patterns for separating groups of players.
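
The Jaccard similarity score used in the abstract above is the size of the intersection of two sets divided by the size of their union. A short Python sketch follows, assuming each algorithm's output is treated as a set of patterns and each pattern as a tuple of movement labels. The pattern sets shown are invented and unrelated to the study's data.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pattern sets from two mining algorithms.
patterns_lccspm = {("walk", "jog"), ("jog", "sprint"), ("sprint", "walk")}
patterns_lcs = {("walk", "jog"), ("walk", "sprint")}

score = jaccard(patterns_lccspm, patterns_lcs)
print(score)  # 0.25
```

One shared pattern out of four distinct patterns gives 0.25, so the reported values of 0.19, 0.008 and 0.009 indicate that the three algorithms produced largely different pattern sets.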

Algorithms , Football , Movement , Humans , Football/physiology , Movement/physiology , Athletic Performance/physiology , Male , Machine Learning , Athletes , Data Mining/methods , Adult , Rugby

8.

Paving the way for COVID survivors' psychosocial rehabilitation: Mining topics, sentiments, and their trajectories over time from Reddit.

Farokhnia Hamedani, Moez; Esmaeili, Mostafa; Sun, Yao; Sheybani, Ehsan; Javidi, Giti.

Health Informatics J ; 30(2): 14604582241240680, 2024.

Article En | MEDLINE | ID: mdl-38739488

Objective: This study examined major themes and sentiments and their trajectories and interactions over time using subcategories of Reddit data. The aim was to facilitate decision-making for psychosocial rehabilitation. Materials and Methods: We utilized natural language processing techniques, including topic modeling and sentiment analysis, on a dataset consisting of more than 38,000 topics, comments, and posts collected from a subreddit dedicated to the experiences of people who tested positive for COVID-19. In this longitudinal exploratory analysis, we studied the dynamics between the most dominant topics and subjects' emotional states over an 18-month period. Results: Our findings highlight the evolution of the textual and sentimental status of major topics discussed by COVID survivors over an extended period of time during the pandemic. We particularly studied pre- and post-vaccination eras as a turning point in the timeline of the pandemic. The results show that not only does the relevance of topics change over time, but the emotions attached to them also vary. Major social events, such as the administration of vaccines or enforcement of nationwide policies, are also reflected in the discussions and inquiries of social media users, in particular in the emotional state (i.e., sentiments and polarity of feelings) of those who have experienced COVID personally. Discussion: Cumulative societal knowledge regarding the COVID-19 pandemic impacts the patterns with which people discuss their experiences, concerns, and opinions. The subjects' emotional state with respect to different topics was also impacted by extraneous factors and events, such as vaccination. 
Conclusion: By mining major topics, sentiments, and trajectories demonstrated in COVID-19 survivors' interactions on Reddit, this study contributes to the emerging body of scholarship on COVID-19 survivors' mental health outcomes, providing insights into the design of mental health support and rehabilitation services for COVID-19 survivors.

COVID-19 , SARS-CoV-2 , Survivors , Humans , COVID-19/psychology , COVID-19/epidemiology , Survivors/psychology , Data Mining/methods , Pandemics , Natural Language Processing , Social Media/trends , Longitudinal Studies

9.

The Association of Malnutrition and Health-Related Factors among 474,467 Older Community-Dwellers: A Population-Based Data Mining Study in Guangzhou, China.

Lin, Wei-Quan; Xiao, Ting; Fang, Ying-Ying; Sun, Min-Ying; Yang, Yun-Ou; Chen, Jia-Min; Ou, Chun-Quan; Liu, Hui.

Nutrients ; 16(9)2024 Apr 29.

Article En | MEDLINE | ID: mdl-38732585

BACKGROUND: This study aimed to examine the prevalence and associated factors of malnutrition in older community-dwellers and explore the interaction between associated factors. METHODS: A total of 474,467 older community-dwellers aged 65 or above were selected in Guangzhou, China. We used a two-step methodology to detect the associated factors of malnutrition and constructed logistic regression models to explore the influencing factors and interactive effects on three patterns of malnutrition. RESULTS: The prevalence of malnutrition was 22.28%. Older adults with both hypertension and diabetes (RERI = 0.13), both meat or fish diet and hypertension (RERI = 0.79), and both meat or fish diet and diabetes (RERI = 0.81) had positive additive interaction effects on the risk of obesity, whereas those on a vegetarian diet with hypertension (RERI = -0.25) or diabetes (RERI = -0.19) had negative additive interaction effects. Moreover, the interactions of physical activity with a meat or fish diet (RERI = -0.84) or dyslipidemia (RERI = -0.09) could lower the risk of obesity. CONCLUSIONS: Malnutrition was influenced by different health factors, and there were interactions between these influencing factors. Pertinent dietary instruction should be given according to different nutritional status indexes and the prevalence of metabolic diseases to avoid the occurrences of malnutrition among older adults.
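
The RERI values in the abstract above measure additive interaction. Under the standard definition, the relative excess risk due to interaction is RERI = RR11 - RR10 - RR01 + 1, where RR11 is the relative risk (or odds ratio) with both exposures, and RR10 and RR01 are those with only one. The abstract does not state how its estimates were computed, so the Python sketch below only illustrates the formula. The function name `reri` and the input values are hypothetical.

```python
def reri(rr_both, rr_first_only, rr_second_only):
    """Relative excess risk due to interaction on the additive scale.

    All three inputs are risk (or odds) ratios relative to the group
    exposed to neither factor. RERI > 0 suggests positive additive
    interaction, RERI < 0 negative additive interaction.
    """
    return rr_both - rr_first_only - rr_second_only + 1


# Illustrative ratios only, not estimates from the study.
value = reri(rr_both=3.0, rr_first_only=1.5, rr_second_only=1.25)
print(value)  # 1.25
```

A value of zero means the joint effect equals the sum of the separate excess risks, which is why the sign of RERI is read as the direction of the interaction.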

Data Mining , Hypertension , Malnutrition , Humans , Aged , China/epidemiology , Male , Female , Malnutrition/epidemiology , Prevalence , Hypertension/epidemiology , Risk Factors , Aged, 80 and over , Independent Living , Nutritional Status , Diabetes Mellitus/epidemiology , Obesity/epidemiology , Diet , Exercise , Logistic Models , Dyslipidemias/epidemiology

10.

Identifying the Effect of Cognitive Motivation with the Method Based on Temporal Association Rule Mining Concept.

Phukhachee, Tustanah; Maneewongvatana, Suthathip; Chaiyanan, Chayapol; Iramina, Keiji; Kaewkamnerdpong, Boonserm.

Sensors (Basel) ; 24(9)2024 Apr 30.

Article En | MEDLINE | ID: mdl-38732962

Being motivated has positive influences on task performance. However, motivation could result from various motives that affect different parts of the brain. Analyzing the motivation effect from all affected areas requires a high number of EEG electrodes, resulting in high cost, inflexibility, and burden to users. In various real-world applications, only the motivation effect is required for performance evaluation regardless of the motive. Analyzing the relationships between the motivation-affected brain areas associated with the task's performance could limit the required electrodes. This study introduced a method to identify the cognitive motivation effect with a reduced number of EEG electrodes. The temporal association rule mining (TARM) concept was used to analyze the relationships between attention and memorization brain areas under the effect of motivation from the cognitive motivation task. For accuracy improvement, the artificial bee colony (ABC) algorithm was applied with the central limit theorem (CLT) concept to optimize the TARM parameters. From the results, our method can identify the motivation effect with only FCz and P3 electrodes, with 74.5% classification accuracy on average with individual tests.

Algorithms , Cognition , Electroencephalography , Motivation , Motivation/physiology , Electroencephalography/methods , Humans , Cognition/physiology , Male , Adult , Female , Brain/physiology , Young Adult , Electrodes , Data Mining/methods

11.

Mine the past: How to make better risk-based decisions and improve outcomes with historical threat data.

Pickren, Ann.

J Bus Contin Emer Plan ; 17(4): 351-362, 2024 Jan 01.

Article En | MEDLINE | ID: mdl-38736162

The impact of every crisis has the potential to cascade throughout an organisation's operations, supply chain and market ecosystem. To properly understand and mitigate this ripple of dynamic risk, business continuity, security and risk management leaders need to know where to focus their attention. Looking at historical threat data provides a clearer picture of the risk landscape, helping leaders better anticipate and plan for the future. To date, however, there have been challenges in this process. As the volume of data about critical events continues to grow at an alarming rate, sifting manually through data puts organisations - and business continuity - in jeopardy. This paper discusses the value of historical threat data and innovations in data-mining technology that can unlock the true power of historical data for informed, strategic decision-making and better outcomes during a crisis.

Data Mining , Disaster Planning , Risk Management , Humans , Disaster Planning/organization & administration , Risk Management/organization & administration , Risk Assessment , Decision Making , Commerce/organization & administration

12.

Assessing the needs of patients with breast cancer and their families across various treatment phases using a Latent Dirichlet Allocation model: a text-mining approach to online health communities.

Da, Chaojin; Duan, Yiwen; Ji, Zhenying; Chen, Jialin; Xia, Haozhi; Weng, Yajuan; Zhou, Tingting; Yuan, Changrong; Cai, Tingting.

Support Care Cancer ; 32(5): 314, 2024 Apr 29.

Article En | MEDLINE | ID: mdl-38683417

PURPOSE: This study aimed to assess the different needs of patients with breast cancer and their families in online health communities at different treatment phases using a Latent Dirichlet Allocation (LDA) model. METHODS: Using Python, breast cancer-related posts were collected from two online health communities: patient-to-patient and patient-to-doctor. After data cleaning, eligible posts were categorized based on the treatment phase. Subsequently, an LDA model identifying the distinct need-related topics for each phase of treatment, including data preprocessing and LDA topic modeling, was established. Additionally, the demographic and interactive features of the posts were manually analyzed. RESULTS: We collected 84,043 posts, of which 9504 posts were included after data cleaning. Early diagnosis and rehabilitation treatment phases had the highest and lowest number of posts, respectively. LDA identified 11 topics: three in the initial diagnosis phase and two in each of the remaining treatment phases. The topics included disease outcomes, diagnosis analysis, treatment information, and emotional support in the initial diagnosis phase; surgical options and outcomes, postoperative care, and treatment planning in the perioperative treatment phase; treatment options and costs, side effects management, and disease prognosis assessment in the non-operative treatment phase; diagnosis and treatment options, disease prognosis, and emotional support in the relapse and metastasis treatment phase; and follow-up and recurrence concerns, physical symptoms, and lifestyle adjustments in the rehabilitation treatment phase. CONCLUSION: The needs of patients with breast cancer and their families differ across various phases of cancer therapy. Therefore, specific information or emotional assistance should be tailored to each phase of treatment based on the unique needs of patients and their families.

Breast Neoplasms , Data Mining , Humans , Breast Neoplasms/psychology , Breast Neoplasms/therapy , Breast Neoplasms/rehabilitation , Female , Data Mining/methods , Needs Assessment , Internet

13.

Mining and exploration of rehabilitation nursing targets for colorectal cancer.

Li, Ruipu; He, Jie; Ni, Zhijie; Zhang, Jie; Chi, Xiaoqian; Kang, Chunbo; Li, Zhongbo; Li, Xubin.

Aging (Albany NY) ; 16(8): 7022-7042, 2024 Apr 16.

Article En | MEDLINE | ID: mdl-38637125

BACKGROUND: There are often subtle early symptoms of colorectal cancer, a common malignancy of the intestinal tract. However, it is not yet clear how MYC and NCAPG2 are involved in colorectal cancer. METHOD: We obtained colorectal cancer datasets GSE32323 and GSE113513 from the Gene Expression Omnibus (GEO). After downloading, we identified differentially expressed genes (DEGs) and performed Weighted Gene Co-expression Network Analysis (WGCNA). We then undertook functional enrichment assay, gene set enrichment assay (GSEA) and immune infiltration assay. Protein-protein interaction (PPI) network construction and analysis were undertaken. Survival analysis and Comparative Toxicogenomics Database (CTD) analysis were conducted. A gene expression heat map was generated. We used TargetScan to identify miRNAs that are regulators of DEGs. RESULTS: 1117 DEGs were identified. Their predominant enrichment in activities like the cellular phase of the cell cycle, in cell proliferation, in nuclear and cytoplasmic localisation and in binding to protein-containing complexes was revealed by Gene Ontology (GO). When the enrichment data from GSE32323 and GSE113513 colon cancer datasets were merged, the primary enriched DEGs were linked to the cell cycle, protein complex, cell cycle control, calcium signalling and P53 signalling pathways. In particular, MYC, MAD2L1, CENPF, UBE2C, NUF2 and NCAPG2 were identified as highly expressed in colorectal cancer samples. Comparative Toxicogenomics Database (CTD) demonstrated that the core genes were implicated in the following processes: colorectal neoplasia, tumour cell transformation, inflammation and necrosis. CONCLUSIONS: High MYC and NCAPG2 expression has been observed in colorectal cancer, and increased MYC and NCAPG2 expression correlates with worse prognosis.

Colorectal Neoplasms , Gene Expression Regulation, Neoplastic , Protein Interaction Maps , Humans , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Colorectal Neoplasms/metabolism , Gene Regulatory Networks , Databases, Genetic , MicroRNAs/genetics , MicroRNAs/metabolism , Data Mining , Gene Expression Profiling , Proto-Oncogene Proteins c-myc/metabolism , Proto-Oncogene Proteins c-myc/genetics , Biomarkers, Tumor/metabolism , Biomarkers, Tumor/genetics

14.

A realworld pharmacovigilance study of FDA adverse event reporting system events for daratumumab.

Yun, Xiaolin; Zhou, Yingying; Wu, Danna; Liu, Yuanbo; Wu, Qiongshi.

Expert Opin Drug Saf ; 23(5): 581-591, 2024 May.

Article En | MEDLINE | ID: mdl-38600747

BACKGROUND: Daratumumab, a first-in-class humanized IgG1κ monoclonal antibody that targets the CD38 epitope, has been approved by the FDA for the treatment of multiple myeloma. The current study aimed to evaluate daratumumab-related adverse events (AEs) through data mining of the US Food and Drug Administration Adverse Event Reporting System (FAERS). RESEARCH DESIGN AND METHODS: Disproportionality analyses, including the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the Bayesian confidence propagation neural network (BCPNN) and the multi-item gamma Poisson shrinker (MGPS) algorithms, were employed to quantify the signals of daratumumab-associated AEs. RESULTS: Out of 10,378,816 reports collected from the FAERS database, 8727 reports of daratumumab-associated AEs were identified. A total of 183 significant disproportionality preferred terms (PTs) were retained. Unexpected significant AEs such as meningitis aseptic, leukoencephalopathy, tumor lysis syndrome, disseminated intravascular coagulation, hyperviscosity syndrome, sudden hearing loss, ileus and diverticular perforation were also detected. The median onset time of daratumumab-related AEs was 11 days (interquartile range [IQR] 0-76 days), and most of the cases occurred within 30 days. CONCLUSION: Our study found potential new and unexpected AE signals for daratumumab, suggesting prospective clinical studies are needed to confirm these results and illustrate their relationship.
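
Two of the disproportionality measures named in the abstract above, the ROR and the PRR, are computed from a 2×2 contingency table of spontaneous reports. The Python sketch below uses the standard textbook formulas with a 95% confidence interval for the ROR. The cell counts are invented, the function names are hypothetical, and the signal thresholds applied in the study are not stated in the abstract, so none are assumed here. BCPNN and MGPS are not covered.

```python
import math


def ror(a, b, c, d):
    """Reporting odds ratio with 95% CI from a 2x2 table.

    a: target drug, target event      b: target drug, other events
    c: other drugs, target event      d: other drugs, other events
    """
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


def prr(a, b, c, d):
    """Proportional reporting ratio: event share for the drug vs. all other drugs."""
    return (a / (a + b)) / (c / (c + d))


# Invented counts for illustration only.
a, b, c, d = 20, 80, 100, 9800
ror_estimate, ror_lower, ror_upper = ror(a, b, c, d)
prr_estimate = prr(a, b, c, d)
print(ror_estimate)  # 24.5
```

A lower confidence bound above 1 is a common criterion for flagging a drug-event pair, although the exact cut-offs vary between studies.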

Adverse Drug Reaction Reporting Systems , Antibodies, Monoclonal , Databases, Factual , Multiple Myeloma , Pharmacovigilance , United States Food and Drug Administration , Humans , Adverse Drug Reaction Reporting Systems/statistics & numerical data , United States , Antibodies, Monoclonal/adverse effects , Antibodies, Monoclonal/administration & dosage , Multiple Myeloma/drug therapy , Male , Female , Middle Aged , Aged , Data Mining , Antineoplastic Agents/adverse effects , Antineoplastic Agents/administration & dosage , Adult , Algorithms
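
The abstract above names the reporting odds ratio (ROR) and the proportional reporting ratio (PRR) without defining them. As a minimal sketch, assuming the standard 2x2 contingency-table definitions of both measures, they can be computed as follows. The function name, the 95% confidence interval based on the log-ROR standard error, and the counts in the usage lines are illustrative assumptions, not values or thresholds taken from the study.

```python
import math


def disproportionality(a, b, c, d):
    """Compute ROR (with an approximate 95% CI) and PRR from a 2x2 table.

    a: reports with the drug of interest AND the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs AND the event of interest
    d: reports with all other drugs and any other event
    All four counts must be positive for the formulas below to be defined.
    """
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR), used for a Wald-type 95% interval.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(ror) - 1.96 * se_log_ror)
    ci_high = math.exp(math.log(ror) + 1.96 * se_log_ror)
    prr = (a / (a + b)) / (c / (c + d))
    return {"ror": ror, "ror_ci": (ci_low, ci_high), "prr": prr}


# Hypothetical counts, chosen only to illustrate the arithmetic.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result["ror"], result["ror_ci"], result["prr"])
```

A signal is commonly flagged when the lower bound of the ROR interval exceeds 1, but the exact criteria vary between studies and are not stated in this abstract. BCPNN and MGPS are Bayesian shrinkage methods and are not covered by this sketch.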

15.

Identifying and overcoming COVID-19 vaccination impediments using Bayesian data mining techniques.

Lei, Bowen; Mahajan, Arvind; Mallick, Bani.

Sci Rep ; 14(1): 8595, 2024 04 13.

Article En | MEDLINE | ID: mdl-38615084

The COVID-19 pandemic has profoundly reshaped human life. The development of COVID-19 vaccines has offered a semblance of normalcy. However, obstacles to vaccination have led to substantial loss of life and economic burdens. In this study, we analyze data from a prominent health insurance provider in the United States to uncover the underlying reasons behind the inability, refusal, or hesitancy to receive vaccinations. Our research proposes a methodology for pinpointing affected population groups and suggests strategies to mitigate vaccination barriers and hesitations. Furthermore, we estimate potential cost savings resulting from the implementation of these strategies. To achieve our objectives, we employed Bayesian data mining methods to reduce data dimensionality and identify significant variables (features) influencing vaccination decisions. Comparative analysis shows that the Bayesian method outperforms cutting-edge alternatives.

COVID-19 , Humans , Bayes Theorem , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Pandemics , Data Mining , Vaccination

16.

Text mining of hypertension researches in the west Asia region: a 12-year trend analysis.

Rezapour, Mohammad; Yazdinejad, Mohsen; Rajabi Kouchi, Faezeh; Habibi Baghi, Masoomeh; Khorrami, Zahra; Khavanin Zadeh, Morteza; Pourbaghi, Elmira; Rezapour, Hassan.

Ren Fail ; 46(1): 2337285, 2024 Dec.

Article En | MEDLINE | ID: mdl-38616180

More than half of the world's population lives in Asia, where hypertension (HTN) is the most prevalent risk factor. Numerous articles have been published about HTN in the Eastern Mediterranean Region (EMRO), and artificial intelligence (AI) methods can analyze these articles and extract the top trends in each country. The present analysis uses latent Dirichlet allocation (LDA), a topic modeling (TM) algorithm in text mining, to obtain subjective topic-word distributions from 2790 studies across the EMRO. The studies examined cover the last 12 years, and the results of the LDA analyses show that HTN research published in the EMRO discusses changes in blood pressure (BP) and the factors affecting it. Among the countries in the region, most of these articles are related to I.R. Iran and Egypt, which show an increasing trend from 2017 to 2018 and reached their highest level in 2021. Meanwhile, Iraq and Lebanon have been conducting research since 2010. The EMRO word cloud highlights 'BMI', 'mortality', 'age', and 'meal', which represent important indicators, dangerous outcomes of high BP, and gender of HTN patients in EMRO, respectively.

Artificial Intelligence , Hypertension , Humans , Data Mining , Algorithms , Asia/epidemiology , Hypertension/epidemiology

17.

[Element relationship and extension path of clinical evidence knowledge map with Chinese patent medicine].

Ji, Zhao-Chen; Hu, Hai-Yin; Peng, De-Hui; Wang, Dan-Lei; Wu, Xiao-Lei; Feng, Chao-Nan; Zhang, Jun-Hua.

Zhongguo Zhong Yao Za Zhi ; 49(3): 836-841, 2024 Feb.

Article Zh | MEDLINE | ID: mdl-38621887

This study aims to construct the element relationship and extension path of a clinical evidence knowledge map for Chinese patent medicine, providing basic technical support for the formation and transformation of the evidence chain of Chinese patent medicine and providing collection, induction, and summary schemes for massive and disorganized clinical data. Based on the elements of evidence-based PICOS, the conventional construction methods of knowledge graphs were collected and summarized. Firstly, the data entities related to Chinese patent medicine were classified, and entity linking (disambiguation) was performed. Secondly, the study associated and classified the attribute information of the data entities. Finally, the logical relationships between entities were constructed, and then the element relationship and extension path of the knowledge map conforming to the characteristics of clinical evidence of Chinese patent medicine were summarized. The construction of the clinical evidence knowledge map of Chinese patent medicine was mainly based on process design and logical structure, and the element relationship of the knowledge map was expressed according to the PICOS principle and evidence level. The extension path crossed three levels (model layer, data layer application, and new evidence application), and the study gradually explored the path from disease, core evaluation indicators, Chinese patent medicine, core prescriptions, syndrome and treatment rules, and medical case comparison (evolution law) to new drug research and development. In this study, the top-level design of the construction of the clinical evidence knowledge map of Chinese patent medicine has been clarified, but it still needs the joint efforts of multiple disciplines. With the continuous improvement of map construction technology in line with the characteristics of traditional Chinese medicine (TCM), the study can provide necessary basic technical support and reference for the development of the TCM discipline.

Drugs, Chinese Herbal , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional , Nonprescription Drugs/therapeutic use , Technology , Data Mining/methods

18.

Analysis of heavy metal and polycyclic aromatic hydrocarbon pollution characteristics of a typical metal rolling industrial site based on data mining.

Li, De'an; Deng, Yirong; Liu, LiLi; Wang, Jun; Huang, Zaoquan; Zhang, Xiaolu.

Environ Geochem Health ; 46(5): 146, 2024 Apr 05.

Article En | MEDLINE | ID: mdl-38578375

With the transformation and upgrading of industries, the environmental problems caused by industrial residual contaminated sites are becoming increasingly prominent. Based on actual investigation cases, this study analyzed the soil pollution status of a remaining site of the copper and zinc rolling industry, and found that the pollutants exceeding the screening values included Cu, Ni, Zn, Pb, total petroleum hydrocarbons (TPH) and 6 polycyclic aromatic hydrocarbon (PAH) monomers. Based on traditional analysis methods such as the correlation coefficient and spatial distribution, combined with machine learning methods such as SOM + K-means, it is inferred that the heavy metals Zn and Pb may be mainly related to the production history of zinc rolling, while Cu and Ni may have originated mainly from the production history of copper rolling. PAHs are mainly due to the incomplete combustion of fossil fuels in the melting equipment. TPH pollution is speculated to be related to oil leakage during the industrial use period and the later period of vehicle parking. The results showed that traditional analysis methods can quickly identify the correlation between site pollutants, while SOM + K-means machine learning methods can further extract complex hidden relationships in the data and achieve in-depth mining of site monitoring data.

Environmental Pollutants , Metals, Heavy , Polycyclic Aromatic Hydrocarbons , Soil Pollutants , Copper/analysis , Polycyclic Aromatic Hydrocarbons/analysis , Lead/analysis , Soil Pollutants/analysis , Metals, Heavy/analysis , Zinc/analysis , Environmental Pollution/analysis , Soil , Environmental Pollutants/analysis , Data Mining , Environmental Monitoring/methods , China , Risk Assessment

19.

Machine Evaluation of Catchment Area Relevance through Text Mining.

Arlen, Philip A; Chakko, Joseph; DeGennaro, Geoffrey; Kobetz, Erin; Mahal, Brandon.

Crit Rev Oncog ; 29(3): 1-4, 2024.

Article En | MEDLINE | ID: mdl-38683150

The University of Miami Sylvester Comprehensive Cancer Center Community Outreach and Engagement Office has developed an algorithm to aid in identifying catchment area relevant trials. We have developed this tool to capture a catchment area (South Florida) that represents the most racially, ethnically, and geographically diverse region in the US. Unfortunately, the area's tumor burden is also significant with many notable disparities, necessitating a prioritization of trials within Sylvester's catchment area. These trials address the needs of the population Sylvester serves by targeting cancers that are locally prevalent.

Data Mining , Humans , Algorithms , Catchment Area, Health , Florida/epidemiology , Machine Learning , Neoplasms/epidemiology , Neoplasms/diagnosis

20.

Text mining and portal development for gene-specific publications on Alzheimer's disease and other neurodegenerative diseases.

Liu, Jiannan; Wu, Huanmei; Robertson, Daniel H; Zhang, Jie.

BMC Med Inform Decis Mak ; 24(Suppl 3): 98, 2024 Apr 17.

Article En | MEDLINE | ID: mdl-38632621

BACKGROUND: Tremendous research efforts have been made in the Alzheimer's disease (AD) field to understand the disease etiology and progression and to discover treatments for AD. Many mechanistic hypotheses, therapeutic targets and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current on this ever-growing body of AD publications is an essential yet difficult task for AD researchers. METHODS: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific, neurodegenerative disease (ND)-focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related gene-specific publication profiles. Six categories of AD-related information were extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keyword occurrence, and co-occurring genes. A user-friendly web portal was then developed using the Django framework to provide gene query functions and data visualizations for the generalized and summarized publication information. RESULTS: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of the publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display the ND publication trends, mouse models used, dementia types, involved brain regions, keywords related to major AD-related biological processes, and co-occurring genes. Direct links to PubMed sites are provided for all recorded publications on the query result page of the web portal. CONCLUSION: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users' research areas and gene targets of interest, which is especially convenient for users without informatics or data mining skills. Our study will not only keep AD researchers updated on the progress of AD research and assist them in conducting preliminary examinations efficiently, but also offer additional support for hypothesis generation and validation, which will contribute significantly to the communication, dissemination, and progress of AD research.

Alzheimer Disease , Neurodegenerative Diseases , Animals , Mice , Data Mining/methods , PubMed , Databases, Factual